Evaluation of the Impact of Concentration and Extraction Methods on the Targeted Sequencing of Human Viruses from Wastewater

Sequencing human viruses in wastewater is challenging due to their low abundance compared to the total microbial background. This study compared the impact of four virus concentration/extraction methods (Innovaprep, Nanotrap, Promega, and Solids extraction) on probe-capture enrichment for human viruses followed by sequencing. Different concentration/extraction methods yielded distinct virus profiles. Innovaprep ultrafiltration (following solids removal) had the highest sequencing sensitivity and richness, resulting in the successful assembly of several near-complete human virus genomes. However, it was less sensitive in detecting SARS-CoV-2 by digital polymerase chain reaction (dPCR) compared to Promega and Nanotrap. Across all preparation methods, astroviruses and polyomaviruses were the most abundant human viruses, and SARS-CoV-2 was rare. These findings suggest that sequencing success can be increased using methods that reduce nontarget nucleic acids in the extract, though the absolute concentrations of total extracted nucleic acid, as indicated by Qubit, and of targeted viruses, as indicated by dPCR, may not be directly related to targeted sequencing performance. Further, using broadly targeted sequencing panels may capture viral diversity but risks losing signals for specific low-abundance viruses. Overall, this study highlights the importance of aligning wet lab and bioinformatic methods with specific goals when employing probe-capture enrichment for human virus sequencing from wastewater.


Summary:
Total pages: 21
Total supplementary tables: 6
Total supplementary figures: 8
Total supplementary methods: 4

Supplementary Tables:
Table S1. Processing details for all 36 samples, including concentration/extraction, quality control, and raw sequencing data QC trimming/deduplication statistics
Table S2. (a) Primers, probes, and cycling parameters for RT-dPCR quantification; (b) Reaction mixtures for 8.5k and 26k 24-well nanoplates
Table S3. dMIQE checklist for RT-dPCR experiments
Table S4. GISAID SARS-CoV-2 reference genome accession numbers
Table S5. 66 virus targets of high public health significance in the Illumina VSP panel
Table S6. Comparison of dPCR and sequencing reads-based classification of SARS-CoV-2 and BCoV

Supplementary Figures:

Figure S1. Schematic description of key steps in each concentration and extraction method
Figure S2. Partition fluorescence plots of positive and negative control
Figure S3. Clustering of all samples by PCoA plot based on the calculated MASH distance of virus sequences classified by Centrifuge
Figure S4. (a) The richness of detected virus species at the species level; (b) The relative abundance of the detected viruses included in the VSP panel
Figure S5. (a) Assessment of assembly quality based on N50 and total assembly length; (b) Count of near-complete virus genomes assembled
Figure S6. Representative assembly visual inspection by Integrative Genomics Viewer (IGV)
Figure S7. Dotplots of assembled putative JC polyomavirus scaffolds with repeated regions at the beginning and the end of the sequence
Figure S8. Maximum likelihood phylogenetic tree of assembled JC polyomavirus scaffolds

Figure S1. Schematic description of key steps in each concentration and extraction method.

Figure S2. Examples of partition fluorescence plots of dPCR positive and negative controls for the CDC N1 assay for SARS-CoV-2.

Figure S3. Clustering of all samples by PCoA plot based on the calculated MASH distance of virus sequences classified by Centrifuge. The PMG_4/26_2 sample exhibited distinct properties compared to the other two biological replicates.

Figure S4. (a) The richness of detected viruses at the species level, encompassing total viruses, human viruses, and those targeted by the Illumina VSP panel. The richness was calculated by counting the total unique taxIDs assigned at the species level in each method; (b) The percentage of total unique reads assigned to each detected virus included in the VSP panel, filtered to counts exceeding 10 reads. Text in each cell indicates the average read count assigned to the virus for each sample.
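
The two quantities in Figure S4 can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the input layout (tuples of method, taxID, and rank) and the function names are hypothetical, and the read counts in the usage note are invented for illustration only.

```python
from collections import defaultdict


def species_richness(assignments):
    """Count unique species-level taxIDs per method.

    `assignments` is an iterable of (method, tax_id, rank) tuples.
    This layout is a hypothetical stand-in for classifier output.
    """
    seen = defaultdict(set)
    for method, tax_id, rank in assignments:
        if rank == "species":
            seen[method].add(tax_id)
    return {method: len(ids) for method, ids in seen.items()}


def percent_of_unique_reads(read_counts, total_unique_reads, min_reads=10):
    """Percentage of total unique reads per virus.

    Only viruses whose read count exceeds `min_reads` are kept,
    mirroring the ">10 reads" filter in the caption.
    """
    return {
        virus: 100.0 * count / total_unique_reads
        for virus, count in read_counts.items()
        if count > min_reads
    }
```

For example, with invented assignments `[("A", 1, "species"), ("A", 1, "species"), ("A", 2, "species"), ("A", 3, "genus")]`, `species_richness` counts two species for method `"A"`: the duplicate taxID is counted once and the genus-level assignment is ignored.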

Figure S5. (a) Assessment of assemblies from each method. The assembly quality is based on N50 and the total assembly length. (b) Count of near-complete virus genomes assembled after applying strict criteria (>1000 bp, >10 average coverage depth, >70% coverage breadth of the complete genome, >80% identity, >90% alignment/query length, best hits for each scaffold, and matching complete genomes) to both scaffolds and BLASTn results. Samples with no recovered near-complete virus genomes are not shown.
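
As a rough illustration of the metrics in Figure S5, the sketch below computes N50 from a list of scaffold lengths and applies the caption's thresholds to one scaffold record. The dictionary keys are hypothetical names chosen for this example and do not correspond to any particular tool's output; the authors' actual filtering code is not shown in the source.

```python
def n50(lengths):
    """Return the N50 of an assembly.

    N50 is the length of the shortest scaffold in the smallest set of
    longest scaffolds that together cover at least half the assembly.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def is_near_complete(hit):
    """Apply the Figure S5(b) thresholds to one scaffold/BLASTn record.

    All keys are hypothetical field names for illustration.
    """
    return (
        hit["length_bp"] > 1000
        and hit["mean_depth"] > 10
        and hit["genome_breadth_pct"] > 70
        and hit["identity_pct"] > 80
        and hit["aln_over_query_pct"] > 90
        and hit["is_best_hit"]
        and hit["subject_is_complete_genome"]
    )
```

With scaffold lengths `[10, 8, 6, 4, 2]` the total is 30, and the two longest scaffolds (10 + 8 = 18) are the first to reach half of it, so `n50` returns 8.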

Figure S6. Representative assembly visual inspection by Integrative Genomics Viewer (IGV). The reads were mapped to the assembled putative JC polyomavirus scaffold (NODE_58_length_5177_cov_12.765912||full) from the PMG_426_1 sample. Both coverage and alignment tracks were shown with mismatches. All reads were colored by pair orientation and shaded by mapping quality (high). The coverage allele frequency threshold was set to 0.2. Note that the genome is circular, resulting in read pairs with mates mapping to the 5' and 3' ends when viewed linearly.

Figure S7. Dotplots of assembled putative JC polyomavirus scaffolds with repeated regions at the beginning and the end of the sequence (see gray box selected in upper right corner). The plot shown represents the sequence of NODE_1332_length_5499_cov_11.746143||full from the NT_426_3 sample.

Figure S8. Maximum likelihood phylogenetic tree of assembled JC polyomavirus genomes. The tree was generated using the Maximum Likelihood method, and the best model was selected by ModelFinder (iqtree): Tamura-Nei (TN) + F (empirical base frequencies) + R2 (free rate model with 2 categories). Node support values, indicating the percentage of trees in which associated taxa clustered together, were obtained from 1000 bootstraps. This analysis included 56 unique nucleotide sequences, with 4168 columns in the final alignment. Reference genomes were colored according to their subtype records in NCBI. Scaffolds from wastewater assemblies are indicated with triangles.

Table S1:
See Excel file

Table S4:
See Excel file

Table S5:
66 targeted viruses included in the Illumina VSP panel. The Illumina website does not provide a complete list of the 203 strains under these 66 viruses, and the naming convention is a mix of genus, species, and sub-species levels. To track the detection of targeted viruses by each concentration/extraction method, the 66 virus names were manually checked against the NCBI taxonomy. Exact matches were recorded with virus name and taxID. For names not exactly matching the NCBI taxonomy, potentially included viruses were recorded with NCBI names, taxIDs, and ranks under the Illumina virus names.

Table S6:
See Excel file